Deep Sentence-Level Authorship Attribution

Authors

  • Stephen Macke
  • Jason Hirshman
Abstract

We examine the problem of authorship attribution in collaborative documents. We seek to develop new deep learning models tailored to this task. We have curated a novel dataset by parsing Wikipedia’s edit history, which we use to demonstrate the feasibility of applying deep models to multi-author attribution at the sentence level. Though we attempt to formulate models which learn stylometric features based on both grammatical structure and vocabulary, our error analysis suggests that our models mostly learn to recognize vocabulary-based cues, making them non-competitive with baselines tailored to vocabulary-based features. We explore why this may be, and suggest directions for future models to mitigate this shortcoming.

Similar resources

A Deep Context Grammatical Model For Authorship Attribution

We define a variable-order Markov model, representing a Probabilistic Context Free Grammar, built from the sentence-level, delexicalized parse of source texts generated by a standard lexicalized parser, which we apply to the authorship attribution task. First, we motivate this model in the context of previous research on syntactic features in the area, outlining some of the general strengths an...

News Authorship Identification with Deep Learning

Authorship identification identifies the most likely author from a group of candidate authors for academic articles, news, emails and forum messages. It can be applied to find the original author of an uncited article, to detect plagiarism and to classify spam / nonspam messages. In this project, we tackled this classification task at the author level, article level, sentence level and word level...

Syntactic Stylometry: Using Sentence Structure for Authorship Attribution

Most approaches to statistical stylometry have concentrated on lexical features, such as relative word frequencies or type-token ratios. Syntactic features have been largely ignored. This work attempts to fill that void by introducing a technique for authorship attribution based on dependency grammar. Syntactic features are extracted from texts using a common dependency parser, and those featur...

Deep Level Lexical Features for Cross-lingual Authorship Attribution

Crosslingual document classification aims to classify documents written in different languages that share a common genre, topic or author. Knowledge-based methods and others based on machine translation deliver state-of-the-art classification accuracy; however, because of their reliance on external resources, poorly resourced languages present a challenge for these types of methods. In this paper...

Patterns of local discourse coherence as a feature for authorship attribution

We define a model of discourse coherence based on Barzilay and Lapata’s entity grids as a stylometric feature for authorship attribution. Unlike standard lexical and character-level features, it operates at a discourse (cross-sentence) level. We test it against and in combination with standard features on nineteen book-length texts by nine nineteenth-century authors. We find that coherence alone...

Publication date: 2015